(54) A method and apparatus for interactive language instruction

(57) A method and apparatus for interactive language instruction
is provided that displays text files for processing, provides key
features and functions for interactive learning, displays facial
animation, and provides a workspace for language building
functions. The system includes a stored set of language rules as
part of the text-to-speech sub-system, as well as another stored
set of rules as applied to the process of learning a language.
The method implemented by the system includes digitally
converting text to audible speech, providing the audible speech
to a user or student (with the aid of an animated image in
selected circumstances), prompting the student to replicate the
audible speech, comparing the student's replication with the
audible speech provided by the system, and providing feedback and
reinforcement to the student by, for example, selectively
recording or playing back the audible speech and the student's
replication.

Description

Background Of The Invention



[0001] This invention relates to a method and apparatus for
interactive language instruction. More particularly, the
invention is directed to a multi-media and multi-modal computer
application that displays text files for processing, provides
features and functions for interactive learning, displays facial
animation, and provides a workspace for language building
functions. The system includes a stored set of language rules as
part of the text-to-speech sub-system, as well as another stored
set of rules as applied to the process of learning a language.
The method implemented by the system includes digitally
converting text to audible speech, providing the audible speech
to a user or student (with the aid of an animated image in
selected circumstances), prompting the student to replicate the
audible speech, comparing the student's replication with the
audible speech provided by the system, conducting performance
analysis on the speech (utterance) and providing feedback and
reinforcement to the student by, for example, selectively
recording or playing back the audible speech and the student's
replication.
[0002] While the invention is particularly directed to the art
of interactive language instruction, and will be thus described
with specific reference thereto, it will be appreciated that the
invention may have usefulness in other fields and applications.
For example, the invention may be used to teach general speech
skills to individuals with speech challenges or may be used to
train singers to enhance vocal skills.

[0003] By way of background, interactive language instruction
programs are known. For example, U.S. Patent No. 5,634,086 to
Rtischev et al. is directed to a spoken language instruction
method and apparatus employing context based speech recognition
for instruction and evaluation. However, such known language
instruction systems require the use of recorded speech as a
model with which to compare a student's attempts to speak a
language sought to be learned.
[0004] Work involved with preparing the lesson as recorded
speech (such as preparing a script) includes recording phrases,
words, etc., creating illustrations, photographs, video, or other
media, and linking the sound files with the images and with the
content of the lessons or providing large databases of
alternative replies in dialogue systems which are designed to
replicate interactions with students for context based lessons,
etc.

[0005] Moreover, language students may be interested in learning
words, phrases, and context of a particular interest such as
industry specific terms from their workplace (computer industry,
communications, auto repair, etc.). Producing such special
content is difficult using recorded speech for the language
lesson.



[0006] [...] in this context are numerous. The quality of the
recording medium may present problems. In this regard, an
excessive amount of background noise in the recording may affect
the quality thereof. In addition, recorded speech is subject to
many other factors that may undesirably enter the speech model.
For example, recorded speech may include speaker accents
resulting from the speaker being a native of a particular
geographic area. Likewise, recorded speech may reflect a
particular emotional state of the speaker such as whether the
speaker is tired or upset. As a result, in any of these
circumstances, as well as others, the shortcomings of recorded
speech make it more difficult for a student to learn a language
lesson.
[0007] A few products exist which allow users to process files
of text to be read aloud by synthesized or recorded speech
technologies. These products are commonly known as
text-to-speech engines. See, for example, U.S. Patent No.
5,751,907 to Moebius et al. (issued May 12, 1998) and U.S. Patent
No. 5,790,978 to Olive et al. (issued August 4, 1998), both of
which are incorporated herein by reference. Some existing
products also allow users to add words to a dictionary, make
modifications to word pronunciations in the dictionary, or
modify the sound created by a text-to-speech engine. See, for
example, EP application no: 00303371.9.
[0008] Voice or speech recognition systems are also known. These
systems use a variety of techniques for recognizing speech
patterns including utterance verification or verbal information
verification (VIV), for which a variety of patents owned by
Lucent Technologies have been applied for and/or issued. Among
these commonly assigned patents/applications are U.S. Patent No.
5,797,123 to Chou et al. (filed December 20, 1996; issued August
18, 1998); EP-A-892 387; EP-A-892 388; and U.S. Patent No.
5,649,057 to Lee et al. (filed January 16, 1996; issued July 15,
1997).
[0009] It would be desirable to have available an interactive
language instruction program that did not rely exclusively on
recorded speech and utilized reliable speech recognition
technology, such as that which incorporates utterance
verification or verbal information verification (VIV). It would
also be desirable to evaluate a speaker's utterance with
predictive models in the absence of a known model. The system
would provide a confidence measure against any acoustic model
from which a score can be derived. It would also be desirable to
have available such a system that selectively incorporates
facial animation to assist a student in the learning process.

[0010] The present invention contemplates a new and improved
interactive language instructor which resolves the
above-referenced difficulties and others.



Summary Of The Invention

[0011] A method and apparatus for voice interactive language
instruction is provided.
[0012] In one aspect of the invention, a system comprises a
first module configured to digitally convert input text to
audible speech in a selected language, a user interface
positioned to receive utterances spoken by a user in attempting
to replicate the audible speech, and a second module configured
to recognize the utterances and provide feedback to the user as
to an accuracy at which the user replicates the speech in the
selected language based on a comparison of the utterances to the
audible speech, any acoustic model, predictive models, phoneme
models, diphone models, or dynamically generated models.
[0013] In a more limited aspect of the invention, a third module
is provided which is synchronized to the first module and which
provides an animated image of a human face and head pronouncing
the audible speech.
[0014] In another aspect of the invention, the animated image of
the face and human head portrays a transparent face and head.

[0015] In another aspect of the invention, the animated image of
the face and human head portrays a three dimensional perspective
and the image can be rotated, tilted, etc. for full view from
various angles.
[0016] In another aspect of the invention, the first and third
modules further include controls to control one of volume,
speed, and vocal characteristics of the audible speech and the
animated image.
[0017] In another aspect of the invention, the model is one of a
predictive model, a phoneme model, a diphone model, and a
dynamically generated model.
[0018] In another aspect of the invention, the first module
includes files storing model pronunciations for the words or
sub-words comprising the input text.
[0019] In another aspect of the invention, the system comprises
lesson files upon which the input text is based.

[0020] In another aspect of the invention, the input text is
based on data received from a source outside of the system.

[0021] In another aspect of the invention, the system further
includes dictionary files.
[0022] In another aspect of the invention, the system further
comprises a record and playback module.
[0023] In another aspect of the invention, the system includes a
table storing mapping information between word subgroups and
vocabulary words.
[0024] In another aspect of the invention, the system includes a
table for storing mapping information between words and
vocabulary words.
[0025] In another aspect of the invention, the system includes a
table for storing mapping information between words and examples
of parts of speech.
[0026] In another aspect of the invention, the system includes a
table of punctuation.
[0027] In another aspect of the invention, the system includes a
table of sub-words and corresponding sub-words in another
language. For word sound drill, for example, when learning a
first language (given a student who natively speaks a second
language), sub-words from the first language may be mapped to
sub-words in the second language, to illustrate sound alike
comparison to the student. The sub-word table will also be used
to locate and display/play vocabulary words using the sub-word
from either language.
[0028] In another aspect of the invention, a method is provided
that includes converting input text data to audible speech data,
generating audible speech comprising phonemes or diphones based
on the audible speech data, generating an animated image of a
face and head pronouncing the audible speech, synchronizing the
audible speech and the animated image, prompting the user to
attempt to replicate the audible speech, recognizing utterances
generated by the user in response to the prompt, comparing the
phonemes or diphones to the utterances, and providing feedback
to the user based on the comparison.
[0029] In another aspect of the invention, a series of sentences
is provided which represent the basic inventory of phonemes and
diphones in a language. The student will read the sentences and
they will be recorded. The sub-words will be analyzed to
determine a baseline score or starting performance of the
student. This may be used to determine progress, to establish a
level for exercises, or to identify areas to work on.
[0030] In another aspect of the invention, a table of reference
scores is provided for grade levels in language classes given
populations of students. The student's progress can be measured
and graded on an individual basis or as compared with the
population of choice.

[0031] In another aspect of the invention, a score for the
student's speech will be provided in sub-words, words,
sentences, or paragraphs. The student can receive an overall
score, or a score on individual parts of the speech.
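
One way to picture scoring at several levels is the short Python sketch below. It is only an illustration: the averaging rule, the 0-100 scale, and the function names are assumptions, since the text does not say how sub-word scores are combined.

```python
from statistics import mean


def word_score(subword_scores):
    """Score a word as the mean of its sub-word scores (assumed rule)."""
    return mean(subword_scores)


def sentence_report(words):
    """Build an overall score plus a score for each individual word.

    `words` maps each word to its list of sub-word scores (0-100).
    """
    per_word = {word: word_score(scores) for word, scores in words.items()}
    return {"overall": mean(list(per_word.values())), "words": per_word}
```

A report built this way lets the student see both the overall score and the individual parts that pulled it down.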

[0032] In another aspect of the invention, normalization issues
regarding verification of speech are managed through the
interface. Given speech of differing duration and complexity,
the animated cursor on the screen can be set by the system or by
the student. When the student reads along with the animated
cursor, the verification process can correlate the text which is
highlighted with the sound file to be analyzed.

[0033] In another aspect of the invention, certain recorded
sounds can be interjected for emphasis of natural sound for
known sub-words or words of a given language. These words may be
taken from a previously recorded dictionary application, or
other resource.
[0034] In another aspect of the invention, baseline scores are
recorded in a table. The table is used to determine the
appropriate level of lesson to be selected for the student. With
this table, the system can automatically use the same text,
content, etc. for students of different abilities by modifying
thresholds of confidence measurement.
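
As a minimal sketch of how the same text could serve students of different abilities, the Python fragment below looks up a confidence threshold from a baseline score. The level names, score cut-offs, and threshold values are illustrative assumptions and do not come from the disclosure.

```python
from bisect import bisect_right

# Hypothetical table: baseline-score cut-offs and the confidence
# threshold a student at each level must reach to pass an utterance.
LEVEL_CUTOFFS = [40, 70]  # baseline scores separating the levels
LEVELS = ["beginner", "intermediate", "advanced"]
THRESHOLDS = {"beginner": 0.50, "intermediate": 0.65, "advanced": 0.80}


def level_for(baseline_score):
    """Map a baseline score (0-100) to a lesson level."""
    return LEVELS[bisect_right(LEVEL_CUTOFFS, baseline_score)]


def passes(baseline_score, confidence):
    """Judge one utterance against the threshold for the student's level."""
    return confidence >= THRESHOLDS[level_for(baseline_score)]
```

With these assumed values, an utterance recognized with confidence 0.6 would pass for a beginner but not for an advanced student, even though both read the same text.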

[0035] In another aspect of the invention, the teacher or
student can use the graphical user interface to establish or
modify thresholds for the confidence measurement, grade level or
other attributes.
[0036] In another aspect of the invention, the student registers
identification, baseline score, and subsequent lesson scores to
achieve customized lessons and to track progress.

[0037] Further scope of the applicability of the present
invention will become apparent from the detailed description
provided below. It should be understood, however, that the
detailed description and specific examples, while indicating
preferred embodiments of the invention, are given by way of
illustration only, since various changes and modifications
within the spirit and scope of the invention will become
apparent to those skilled in the art.

Description Of The Drawings

[0038] The present invention exists in the construction,
arrangement, and combination of the various parts of the device,
and steps of the method, whereby the objects contemplated are
attained as hereinafter more fully set forth, specifically
pointed out in the claims, and illustrated in the accompanying
drawings in which:

Figure 1 is a schematic illustration of a system according to
the present invention;

Figure 2 is an illustration of a window generated to facilitate
interactive learning according to the present invention;

Figure 3 is a flowchart of the overall method according to the
present invention;

Figure 4 is a detailed flowchart of a text selection and audible
speech generation method according to the present invention;

Figure 5 is a detailed flowchart of a text selection, animation
and audible speech generation method according to the present
invention;

Figure 6 is a detailed flowchart of a recording method according
to the present invention;

Figure 7 is a detailed flowchart of another recording method
according to the present invention;

Figure 8 is a detailed flowchart of a playback method according
to the present invention;

Figure 9 is a flowchart illustrating a student registration
method according to the present invention;

Figure 10 is a flowchart showing a grade level evaluation [...]
invention; and,

Figure 11 is a flowchart showing a scoring example according to
the present invention.

[0039] Referring now to the drawings wherein the showings are
for purposes of illustrating the preferred embodiments of the
invention only and not for purposes of limiting same, Figure 1
provides a view of the overall preferred system according to the
present invention. As shown, an interactive language instruction
system 10 is provided. The system 10 includes a computerized
apparatus or system 12 having a microcontroller or
microprocessor 14. The system 10 further has one or more input
devices 16 such as a keyboard, mouse, etc., a microphone 18, an
input link 20, one or more display devices 22, an audio speaker
24 and an output file interface unit 26. All such components are
conventional and known to those of skill in the art and need not
be further described here. Moreover, it should be appreciated
that the system 10 in suitable form may also be incorporated in
and/or compatible with client-server and slim client
architectures. It is to be further appreciated that the system
could be provided and deliverable through compact disks, the
Internet, or downloadable to a smaller or more mobile device.

[0040] The system 12 includes a variety of components which may
be incorporated therein as shown or may be remotely located from
computer 12 and accessible over a network or other connection in
accordance with the present invention. As shown, the system 10
includes a text-to-speech module, or TTS module, 30 and an
automated speech recognition module, or ASR module, 32. These
modules are conventional and known to those of skill in the art.
Preferably, the TTS module 30 incorporates teachings of, for
example, U.S. Patent No. 5,751,907 to Moebius et al. (issued May
12, 1998) and U.S. Patent No. 5,790,978 to Olive et al. (issued
August 4, 1998), and the ASR module (including the verbal
information verification portion 32a) incorporates, for example,
the teachings of U.S. Patent No. 5,797,123 to Chou et al. (filed
December 20, 1996; issued August 18, 1998); EP-A-892 387 A1;
EP-A-892 388; and U.S. Patent No. 5,649,057 to Lee et al. (issued
July 15, 1997).

[0041] The TTS module 30 converts text stored as digital data to
audio signals for output by the speakers 24 in the form of
phonemes and the ASR module 32 converts audio signals received
through microphone 18 into digital data. Also provided to the
system is an animation module 34.

[0042] The TTS module 30 has associated therewith a rules module
36 for facilitating the conversion of the text to audible
speech. More specifically, the rules module [...] analysis of
the words for which conversion to audible speech is sought. The
rules module sequentially analyzes a selected word, analyzes the
word in the context of the sentence (e.g. analyzes the
surrounding words or the part of speech (e.g. determines whether
"address" is a noun or a verb)), and then analyzes the sentence
format (e.g. determines whether the sentence is a question or a
statement). This analysis scheme facilitates a more accurate
pronunciation of each word (e.g. proper emphasis) in the context
in which the word is used. The TTS module 30 is also in
communication with a dictionary file or recorded dictionary 38
to facilitate proper pronunciation of selected words and, of
course, a lesson file 40 from which text for lessons is
retrieved. It is to be appreciated the lesson text may also be
obtained through input link 20 from various other sources
including the Internet, LANs, WANs, scanners, closed caption
devices, etc. This feature allows the content of the lessons to
be separated from the functions of the system. That is, the
system and method of the present invention can be applied to
different lesson content to suit the needs and/or desires of the
user or student.
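
The three-level analysis (word, sentence context, sentence format) can be sketched in Python as follows. This is a toy illustration only: the stress entries, the determiner cue for telling nouns from verbs, and all function names are assumptions, not the rules actually stored in the rules module 36.

```python
import re

# Hypothetical pronunciation entries: stress depends on part of speech.
STRESS_BY_POS = {
    "address": {"noun": "AD-dress", "verb": "ad-DRESS"},
    "record": {"noun": "REC-ord", "verb": "re-CORD"},
}
# Crude context cue (an assumption): a preceding determiner suggests a noun.
DETERMINERS = {"a", "an", "the", "my", "your", "his", "her", "this", "that"}


def sentence_format(sentence):
    """Level 3: classify the sentence as a question or a statement."""
    return "question" if sentence.strip().endswith("?") else "statement"


def part_of_speech(words, index):
    """Level 2: guess noun vs. verb from the preceding word."""
    previous = words[index - 1].lower() if index > 0 else ""
    return "noun" if previous in DETERMINERS else "verb"


def analyze(sentence):
    """Run the three levels in order and return pronunciation hints."""
    words = re.findall(r"[A-Za-z']+", sentence)
    hints = []
    for i, word in enumerate(words):
        entry = STRESS_BY_POS.get(word.lower())  # Level 1: the word itself
        if entry:
            hints.append((word, entry[part_of_speech(words, i)]))
    return {"format": sentence_format(sentence), "hints": hints}
```

A real text-to-speech front end would use a proper part-of-speech tagger and prosody model; the point here is only the ordering of the three analyses.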
[0043] The preferred TTS module or engine includes therein model
pronunciations of all words and sub-words entered in text. These
model files are ultimately used to compare with the words spoken
by the student, as will be described in greater detail below.
Any word in a dictionary or file can be used with the system of
the present invention. In other language learning products,
lessons are limited to the recorded words and phrases. The
preferred TTS module provides the capability to recognize text
or a text file and process it, converting it to audible speech.
The preferred addition of a TTS module provides flexibility for
lesson production in that it repurposes other materials, current
events, news stories, web content, files, specialized documents,
etc. and provides the ability to apply the application to
special needs situations such as speech therapy where customized
word practice is desirable.
[0044] For a language student, the preferred TTS module provides
examples of text pronounced according to the rules of speech in
that language. The vocal quality of the TTS module is preferably
extremely high. The student can thus listen to the
pronunciations of any word in the dictionary or file even when a
native English speaker is not available, and without requiring
that the words used in lessons be previously recorded and stored
in the system. Inflections and tonal variations common to
language in context are included in the system, which would be
difficult to do with recorded speech. The TTS module also
accommodates regional accents through the addition of specific
pronunciation files which may be used in a specific context to
demonstrate pronunciation alternatives including but not limited
to: American, regional American, English pronunciation of
Spanish words, proper names, trademark and technical words, etc.
[0045] The ASR module 32 includes a verbal information
verification (VIV) portion 32a for providing utterance
verification to the ASR module 32. This preferred form of the
ASR module having the verbal information verification (VIV)
portion compares the output of phonemes processed by the TTS
engine and voice, its own acoustic model or any derived acoustic
model, or utterances spoken by the student. The VIV portion
analyzes the similarity with which a speaker matches the file
created by the TTS module 30. This comparison provides the basis
of the feedback to the student. An overall score is offered to
the student for feedback. In addition, individual word parts or
phoneme matches are analyzed to indicate where precisely the
student may be having difficulty in pronunciation. Feedback is
provided to the student for each portion of the speech created.
Reinforcement for pronunciation is provided to the student based
upon rules of the language, identification of the word or word
segment identified, known pronunciation problems carried from
the student's native language, and the student's level of
achievement.
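
The idea of an overall score plus a per-phoneme indication of difficulty can be sketched as below. This is a symbolic stand-in, not the VIV technique itself: an actual verifier scores acoustic likelihoods, whereas this example simply aligns two phoneme label sequences with Python's standard `difflib`. The ARPAbet-style labels and the function name are assumptions.

```python
from difflib import SequenceMatcher


def compare_phonemes(model, spoken):
    """Align two phoneme sequences; return (overall score, problem list)."""
    matcher = SequenceMatcher(None, model, spoken, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what the model expected and what was recognized.
            problems.append({"expected": model[i1:i2], "heard": spoken[j1:j2]})
    return round(100 * matcher.ratio()), problems
```

For the word "think", a student who substitutes T for TH would get a reduced overall score and a single problem entry pointing at the first phoneme, which is the kind of localized feedback the paragraph describes.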

[0046] The animation module 34 provides visual aid to a student.
The module 34 is synchronized with the TTS module 30, retrieves
text files and, together with the TTS module or engine,
pronounces the word for the student through an animated image of
a human head and face. Preferably, the animated image of the
face and human head portrays a three-dimensional perspective and
the image has the capability of being rotated, tilted, etc. for
full view from various angles. Accordingly, the student can
observe characteristics of facial and mouth movements, and
placement of the tongue, lips and teeth during speech examples.
The animation module synchronizes the facial movement with
processing of the TTS module in manners that are well known to
those of skill in the art. The student can observe the animated
image, or teacher, from any angle, with normal or transparent
mode to further observe teeth and tongue placement. The teacher
example can be modified. Volume, speed, and vocal
characteristics of the teacher may be changed by the student
using the computer interface. Voice may be male or female,
higher or lower, faster or slower. As will be described
hereafter, reinforcement will be provided to the student based
upon rules of the language, known pronunciation problems carried
from the student's native language and the student's level of
achievement.

[0047] The system 10 also includes a workspace module 42 that
generates and facilitates processing in a viewable workspace on
the display device 22. The workspace module 42 is linked to a
pronunciation module 44, mapping tables 46, word processing
module 48 and record and playback module 50.
[0048] The pronunciation module 44 includes databases containing
records for words, word subgroups, vocabulary words used to
teach typical sounds in a language, examples from parts of
speech used to teach contextual pronunciation of words and
tables of punctuation. The sample words are selected in creating
the pronunciation databases based on grammatical and linguistic
rules for the language. Preferably, the sample words for each
character or character group (e.g. diphthong) are ordered
generally from more common usage in pronunciation of the
character to a less common usage. The module 44 also
accommodates regional accents through the addition of specific
pronunciation files which may be used to establish a profile in
a specific context to demonstrate pronunciation alternatives
including but not limited to: American, regional American,
English pronunciation of Spanish words, proper names, trademark
and technical words, etc.
[0049] The mapping tables 46 include tables 46a having stored
therein mappings between the word subgroups and the vocabulary
words used to teach typical sounds in a language, tables 46b
having stored therein mappings between the words and the
vocabulary words used to teach typical sounds in a language, and
tables 46c having stored therein mappings between the words and
the examples from parts of speech to teach contextual
pronunciation of words. The system also includes tables 46d
storing examples of punctuation typically used in a language
that may be used in lessons independently, or in the context of
a sub-word, word, or group.
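
The four kinds of mapping tables can be pictured as plain dictionaries, as in the Python sketch below. The entries and the `practice_words` helper are hypothetical; the disclosure names the tables (46a to 46d) but does not list their contents or how they are queried.

```python
# Hypothetical contents for tables 46a-46d.
SUBGROUP_TO_VOCAB = {  # 46a: word subgroup -> vocabulary words
    "ou": ["out", "house", "soup"],  # more common sound first
    "th": ["the", "think"],
}
WORD_TO_VOCAB = {  # 46b: word -> vocabulary words teaching its sounds
    "mouse": ["house", "out"],
}
WORD_TO_POS_EXAMPLES = {  # 46c: word -> examples by part of speech
    "address": {"noun": "Write your address.", "verb": "Address the class."},
}
PUNCTUATION = {  # 46d: punctuation mark -> example
    "?": "Where is the station?",
}


def practice_words(text):
    """Collect vocabulary words for every subgroup that occurs in the text."""
    lowered = text.lower()
    return {
        subgroup: words
        for subgroup, words in SUBGROUP_TO_VOCAB.items()
        if subgroup in lowered
    }
```

Keeping each mapping in its own table means a lesson can look up practice material by sub-word, by word, or by part of speech without touching the other tables.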

[0050] Referring now to Figure 2, a representative view of the
primary window generated by the system 10 and appearing to the
user is shown. The window 60 includes a workspace 62 associated
with the workspace module 42, a feedback area 64 primarily
associated with the ASR module, an animation area 66 primarily
associated with the animation module, and a control area 68
primarily associated with the TTS module and the animation
module. The workspace 62 facilitates display and manipulation of
text for lessons for the student. The feedback area 64
facilitates display and manipulation of feedback provided to the
student by the system, as will be hereafter described. The
animation area includes, as shown, an exemplary animated face
and head 66a. Last, the control area includes user interface
control icons such as volume adjustment 68a, speed adjustment
68b, a stop button 68c, a play button 68d, a pause button 68e,
and a record button 68f. The student interactively manipulates
the window 60 to perform functions according to the present
invention.
[0051] The overall method of the preferred embodiment is
illustrated in Figure 3. It is to be appreciated that the
methods described in connection with Figure 3, as well as
Figures 4-11, are implemented using hardware and software
techniques that will be apparent to those of skill in the art
upon a reading of the disclosure hereof.
[0052] As shown, the method 300 is initiated with the input of
text (step 302) and subsequent conversion of the input text to
audible speech data (step 304). Audible speech is generated and
output based on the audible speech data (step 306). Of course,
the audible speech can also be represented by a variety of
models including predictive models, phoneme models, diphone
models or dynamically generated models. These models are
generated primarily by the ASR module and associated elements.
However, in certain circumstances the TTS module may also be
used to generate the acoustic models. When desired by the
student, an animated image of a human face and head is then
generated primarily by the facial animation module 34 (step 308)
and the audible speech and animated image are synchronized (step
310). A student is then prompted to replicate the audible speech
with spoken words, or utterances (step 312). The system then
recognizes the utterances of the student (step 314) and compares
these utterances to the audible speech data primarily through
the module 32 (including portion 32a) (step 316). Feedback is
then provided to the student based on the comparison and a
confidence measure which is correlated to customized scoring
tables and used as a calibration point, as is known in the art,
in a variety of manners as described below (step 318). The
feedback preferably reflects the precision at which the user
replicates the audible speech in the selected language.
[0053] Figure 4 provides a more detailed description for steps
302, 304, and 306. More particularly, a submethod 400 includes
the selection of input text (step 402) followed by retrieval of
the text using either a Universal Resource Locator (URL) or a
stored file (step 404). If a URL is used, the URL is typed into
the field and the text is retrieved (step 406). If the text is
stored in a file, the file is selected (step 408). In either
event, the retrieved text is displayed in the workspace 62 (step
410). The play button 68d is then pressed, or "clicked on" (step
412). A determination is made whether the selected text
originated from a source located using a URL or a file (step
414). If the text originated by way of a URL, the markup
language is preprocessed (step 416). The text may be
preprocessed to present ideal form for TTS processing, for
example, removing any markup language, textual illustrations,
parsing known or provisioned formats such as email, faxes, etc.
In either case, a subset of the text is then prefetched (step
418) and text-to-speech processing is initiated (step 420).
Optionally, the speed and volume of the speech are checked
(steps 422 and 424). The sound is then played (step 426) and a
determination is made whether the playing of the audible speech
is complete (step 428). If the playing of the audible speech is
not complete, steps 418 to 428 are repeated. If the playing of
the audible speech is completed, the process ends (step 430).
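
Two of the steps above, markup preprocessing (step 416) and prefetching a subset of text (step 418), lend themselves to a short Python sketch. It assumes the markup is HTML and that a "subset" is one sentence; neither detail is specified in the text, and the function and class names are hypothetical. Only the standard-library `html.parser.HTMLParser` is used.

```python
from html.parser import HTMLParser
import re


class _TextOnly(HTMLParser):
    """Collect character data and drop tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess_markup(markup):
    """Step 416 (sketch): strip markup so only readable text remains."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def prefetch(text):
    """Step 418 (sketch): yield one sentence-sized subset at a time."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if sentence:
            yield sentence
```

Each subset yielded by `prefetch` would then be handed to text-to-speech processing and played before the next one is fetched, which mirrors the loop over steps 418 to 428.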
[0054] In a situation where animation is used (e.g. a teacher
prompt), a detailed description of steps 302 through 310 of
Figure 3 is shown in Figures 4 and 5. For brevity, submethod 500
simply replaces steps 418 to 430 of Figure 4.

[0055] Referring now to Figure 5, after the play button 68d is
pressed, the subset of the selected text is prefetched (step
502). Text-to-speech processing is then initiated (step 504).
Text animation processing is also initiated (step 506). The
speed and volume are then checked (steps 508 and 510) and
adjusted if necessary. The sound and facial movements are output
to the user in the animation area 66 (steps 512 and 514). A
determination is then made whether the playing of the audible
speech with animation is completed (step 516). If not complete,
steps 502 to 516 are repeated. If playing of the audible speech
with the animation is complete, the process is ended (step 518).
[0056] Referring back to Figure 3, the student is prompted at
step 312 to replicate the audible speech played. In this regard,
the student may be actively or passively prompted by the system
to repeat the teacher's example. Methods of prompting include a
moving cursor, moving highlighted area of text, or moving icon.
Audible prompts may be used including but not limited to:
stating "repeat after me" and then stating the word to be
repeated. The speed of the prompt is also adjustable.

[0057] The student may choose to record his or her attempts at
speech during a lesson. The student can listen to the teacher
and his or her recording for a side-by-side comparison. The
recording can also be used by the Automated Speech Recognition
function to determine the student's performance as will be
described below.
[0058] As shown in Figure 6, the recording method
is initiated by selecting (or pressing or "clicking on") the
play button 68a and record button 68f (step 602). A
determination is then made whether the sought text file
is stored or should be retrieved by way of a URL (step
604). If a URL is involved, the markup language must be
preprocessed (step 606). The text is then prefetched
(step 608) and highlighted text is particularly selected
(step 610). In either case, text-to-speech processing is
initiated (step 612) and then a determination is made
whether animation should be used (step 614). If animation
is desired, the animation data is processed (step
616). Whether the animation is processed or not, the
speed and volume are checked (steps 618 and 620).
The sound is then played along with the animation, if
desired (steps 622 and 624). A determination is made
whether the playing is done (step 626). If the process is
not complete, steps 606-626 are repeated.
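
The markup preprocessing of step 606 could be sketched as follows. This is only an illustration using Python's standard html.parser module; the function name strip_markup and the choice to simply discard all tags are assumptions, not details taken from the specification.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects the character data found between markup tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_markup(markup: str) -> str:
    """Return the plain text of a marked-up page, ready for text-to-speech.

    Hypothetical helper: tags are dropped and runs of whitespace collapsed.
    """
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_markup("<p>Hello <b>world</b>!</p>"))
```

A fuller preprocessor would probably also skip script and style content and insert breaks between block elements; the sketch leaves those out for brevity.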
[0059] If playing is completed, as shown in Figure 7,
a prompt to the student is made by the system (step
702). Text for student replication is highlighted (step
704). The speed is checked (step 706). Recording of
the student is begun (step 708) and the cursor is moved
at a designated speed (step 710). A determination is
then made whether the process is complete (step 712).
If not, steps 702 to 712 are repeated. If the process of
recording is complete, the process is ended (step 714).
[0060] Referring back to Figure 3, the system recognizes
the utterances, compares them to the audible
speech files or records for which models can be generated,
and provides feedback (steps 314, 316, 318). With
respect to the step 316, as will be appreciated by those
skilled in the art, the comparison could occur between
the utterances and any of the following: audible speech,
any acoustic model, predictive model, phoneme models,
diphone models, or dynamically generated models.
The feedback may be provided in a variety of manners.
One form of feedback is to allow the student to play back
the lesson.

[0061] Referring now to Figure 8, a playback
method is shown. In this regard, the method 800 first
makes a determination whether a teacher is selected or
a student is selected for playback (step 802). If a
teacher is selected, text is highlighted (step 804), speed
and volume are checked (steps 806 and 808), text-to-speech
and animation data are processed (steps
810 and 812) and the sound is played and animation
moved (steps 814 and 816). A determination is then
made whether the playback is complete (step 818). If
not, steps 804 to 818 are repeated. If the process is
complete, it is terminated (step 820).
[0062] If a student is selected at step 802, the text
to be played back is highlighted (step 822). The speed
and volume are then checked (steps 824 and 826). The
sound, or recorded utterances of the student, is played
(step 828). A determination is then made as to whether
playback is complete (step 830). If not, steps 822 to 830
are repeated. If the student playback is complete, the
process is ended (step 832).

[0063] The student may also select to be evaluated
to see how closely his or her pronunciation matches the
teacher's model pronunciation. Automated Speech
Recognition (utterance verification) and Verbal Information
Verification (VIV) are used through modules 30, 32
and 32a (and associated elements) to determine accuracy
in pronouncing words, word segments, sentences,
or groups of sentences. In a preferred form, utterance
verification would identify plosives such as "P" and "B"
or "T" and "D". Scoring the accuracy includes but is not
limited to: gross score for overall performance, score on
individual sentences, score on individual words, and
score on phonemes.

[0064] Such feedback to the student takes several
forms and may be used to score performance, determine
reinforcement, and determine feature levels of the
application (novice, intermediate, advanced). Feedback
may be given explicitly or through a file sent to a teacher
through output file storage 26 (Figure 1), or both.
[0065] Overall scores include numeric values (for
sentence groups, sentences, words, and sub-words)
calibrated to account for the student's level such as novice,
intermediate, expert, year of study associated with
a syllabus to be used as a reference file, or other. The
system may be set to display or not to display this score
information to the student in the feedback area 64. The
application can determine the student's level through statistical
information contained within the system or available
over a network and student specific information
collected while the student is interacting with the system,
or by student level self-identification, or by teacher
or therapist provisioning. Icons may be used to indicate
level of performance for the student feedback including
but not limited to a series of symbols such as stars, circles,
etc. arranged in groups, and any of the many symbols
frequently used to indicate successful task
completion. An example would be to have three circles
available. When the student needs some improvement,
two would be shown. When the student needs more
improvement to match the model, only one circle would
be shown. When the student is successful in closely
matching the model (based on pre-determined student
level) all circles would be displayed. Color may be used
to indicate level of performance.
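
The three-circle example can be sketched in a few lines. The function names, the 0-to-1 score scale, and the 0.15 margin separating "some improvement" from "more improvement" are all assumptions made for illustration; the specification does not give numeric cut-offs.

```python
def circles_for_score(score: float, level_threshold: float,
                      margin: float = 0.15) -> int:
    """Return how many of three circles to show for a score in [0, 1].

    level_threshold is the (hypothetical) score a student at the
    pre-determined level must reach to count as closely matching the model.
    """
    if score >= level_threshold:
        return 3          # successful match: all circles displayed
    if score >= level_threshold - margin:
        return 2          # some improvement needed
    return 1              # more improvement needed


def render_circles(count: int) -> str:
    """Draw the feedback icons as plain text."""
    return " ".join("O" for _ in range(count))


if __name__ == "__main__":
    for s in (0.9, 0.7, 0.5):
        print(s, render_circles(circles_for_score(s, level_threshold=0.8)))
```

Because the threshold is a parameter, the same score can earn a different number of circles for a novice than for an advanced student.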
[0066] Feedback on performance may be given
while the student is reading the text or after task completion.
While the student is reading the text, the Verbal
Information Verification processing (or utterance verification
processing) can be used to display real-time performance
feedback. The system may use any number of
graphical or audio cues to indicate performance including
but not limited to bars, icons, colors, sound effects,
or TTS text files. The system will indicate to the student
that there is a problem and will help the student to
decide if he or she should repeat the task, change the
speed, move to another mode or feature such as word
building, or listen to the teacher example again. Default
options will be established based upon the specific performance
issue and will be determined by, for example,
the VIV feature.

[0067] Once a student has been practicing for some
period of time, he or she can again request feedback. A
summary can be created to provide feedback to the student
including but not limited to highlighted words within
text, overall scores, discrete scores for segments of
work, icons to illustrate overall achievement, and audible
feedback indicating performance. The audible feedback
can be a sound effect such as a cheering crowd, or
a can sound when a cursor is moved over a word that
was not pronounced well. The student can play back
and listen to the model and their own pronunciation for
comparison.

[0068] A word and punctuation list can be used to
practice pronunciation skills, review definitions, etc. The
list may be created from a lesson file (e.g. lesson file
40), from a dictionary or other reference material, from
the Internet (e.g. through input link 20), or from a list of
sub-words, words, or groups (e.g. stored in pronunciation
file 44) pronounced inaccurately by the student.
One advantage to the system is that combinations of
words and punctuation impact pronunciation and the
system can identify these cases and provide feedback
and reinforcement for these cases. For any case, including
for words that have been mispronounced, the system
can arrange the words into an order such as closest
match through not well matched and, as a default, can
present the items needing most work at the top of the
list. Word lists appear in the working window. In the
example given, a working window appears in the feedback
area 64. The student can use a tool to scroll
through the list. The student can highlight and select a
word with a voice command or mouse. When the phoneme
or word (or group) is highlighted, the teacher's
voice is heard pronouncing the word. An optional feature
on highlighting a sub-word, word, or group is to set
the system to repeat the teacher's voice and also the
student's voice for side by side feedback or to go to the
recorded dictionary to play a sound file. The student can
try to pronounce the word again at this point and get
feedback. When the word is selected, the student can
see a more detailed workspace feature in the window.
The workspace feature uses language rules to process
the sub-word, word, or group and display spelling, punctuation,
stresses (accents), syllables, etc. The student
can select to hear the example again and try to pronounce
it. The student is scored again, and if performance
is improved and feedback is satisfactory as
determined by the student or teacher, the word lesson is
ended. If not, more help may be given by the system.
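
As a rough sketch of the default ordering described above, where the items needing most work appear at the top of the list, a per-word score table can simply be sorted. The function name and the score values are hypothetical.

```python
from typing import Dict, List


def order_practice_list(scores: Dict[str, float],
                        worst_first: bool = True) -> List[str]:
    """Order words by match score.

    worst_first=True puts the poorest matches at the top (the default
    presentation); worst_first=False lists closest match through not
    well matched.
    """
    return sorted(scores, key=lambda word: scores[word],
                  reverse=not worst_first)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = {"the": 0.95, "fox": 0.50, "lazy": 0.70}
    print(order_practice_list(example))
```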
[0069] If the student has trouble pronouncing the
word with audible example and feedback, reinforcement
is offered through the working window 60. Moving the
cursor over a portion of the displayed sub-word, word,
or group activates the default feature to pronounce it.
This feature may be turned off. Selecting the word portion
provides reinforcement windows with help for the
student. An example of reinforcement help includes a
message saying "Try this ... in the word 'graduate' the 'd'
is pronounced with a 'j' sound as in 'jar.'" With a table
indicating known language rules for pronunciation, text
messages are dynamically created based upon the circumstances
within the selected sub-word, word, or
grouping. The student sees the message in the window,
and also hears the teacher speak the message.
[0070] Messages are nested by the system. If there
are multiple linguistic reasons why a match is not made
between the model and the student in a particular sub-word,
word, or group case, then the messages are presented
to the student in an order determined ahead of
time. The most likely pronunciation rule is first, then less
likely rules in a determined order.
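
One way to picture the predetermined ordering of help messages is a small rule table with a priority per rule. The sketch below is an assumption-laden illustration: the PronunciationRule class, the letter-pattern matching, and both example rules are hypothetical and stand in for whatever rule table the system actually stores.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PronunciationRule:
    priority: int   # lower number = more likely cause, presented first
    pattern: str    # letters that make the rule applicable (hypothetical)
    message: str    # text of the help message


def messages_for(word: str, rules: Sequence[PronunciationRule]) -> List[str]:
    """Return the applicable help messages, most likely rule first."""
    applicable = [rule for rule in rules if rule.pattern in word.lower()]
    return [rule.message
            for rule in sorted(applicable, key=lambda rule: rule.priority)]


if __name__ == "__main__":
    # Illustrative rules only; not taken from any real rule table.
    rules = [
        PronunciationRule(2, "ate", "Try this ... check the final vowel."),
        PronunciationRule(1, "du", "Try this ... the 'd' sounds like 'j'."),
    ]
    for message in messages_for("graduate", rules):
        print(message)
```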

[0071] An analysis of known errors of pronunciation
will be used to assist the student. For example, there
are known linguistic errors made by Korean students
studying English. The "Try this..." system of messages
will include considerations for the user of the system
and will present instructions most likely to help that particular
student based upon student self-identification.
Text or audible help for this feature may be presented in
the native language or the target language, a combination,
or both. For example, the pronunciation files 44
may include a table of sub-words and corresponding
sub-words in another language. For word sound drill, for
example, when learning a first language (given a student
who natively speaks a second language), sub-words
from the first language may be mapped to sub-words
in the second language, to illustrate sound alike
comparison to the student. The sub-word table will also
be used to locate and display/play vocabulary words
using the sub-word from either language. Another practice
feature associated with the workspace is an option
to list practice sub-words, words, or groups in a window,
and permit practice of sounds relating to the specific
problem encountered by the student. An example would
be to highlight the area needing practice, such as "al." A
list would be displayed with words containing this combination
such as "balk," "talk," and "walk." The teacher
would read the example, and the student could practice
the words. Then the student could return to the original
sub-word, word, or group being drilled, and continue to
practice.
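
The practice-list feature, in which a highlighted sub-word pulls up vocabulary words that contain it, can be sketched as a simple lookup. The function name and the vocabulary list are assumptions; a real system would presumably match on sounds rather than spelling.

```python
from typing import Iterable, List


def words_containing(sub_word: str, vocabulary: Iterable[str]) -> List[str]:
    """List the vocabulary words whose spelling contains the sub-word."""
    needle = sub_word.lower()
    return [word for word in vocabulary if needle in word.lower()]


if __name__ == "__main__":
    # Hypothetical vocabulary; only the three "alk" words echo the text.
    vocabulary = ["balk", "talk", "walk", "fox", "jumps"]
    print(words_containing("al", vocabulary))
```

Matching on spelling is the simplest possible stand-in; mapping through a phoneme table would avoid listing words where the letters are present but the sound differs.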

[0072] The student can review the lesson in any
mode including teacher example; example and prompt;
example, prompt and record; and example, prompt, record,
and compare.

[0073] The student lessons may be presented in a
graphic illustration. The student can zoom in for further
detail. The student can navigate around the content and
highlight and select an area or detail to be studied. The
student performance may be presented in telescoping
graphical representations to provide access to all or
portions of the session completed. The student can
zoom in and refine skills, review complete sessions, or
select portions. Higher levels will be illustrated with less
detail. Zooming in will provide smaller pieces with more
detail. The student can decide where in the lesson to
begin from this overall graphical representation.
[0074] As to scoring and evaluation of the performance
of a student, a variety of techniques and operations
may be incorporated into the system. Scoring
techniques are well known in the art; however, in a preferred
form, customized scoring tables are generated
with confidence scores as calibration points, as is
known in automated speech recognition practice. For
example, a series of sentences may be provided which
represent the basic inventory of phonemes and
diphones in a language. The student will read the sentences
and they will be recorded. The sub-words will be
analyzed to determine baseline score or starting performance
of the student. This may be used to determine
progress, to establish a level for exercises, or to identify
areas to work on. A table of reference scores may also
be provided for grade levels in language classes given
populations of students. The student progress can be
measured and graded on an individual basis or as compared
with the population of choice.
[0075] Scores for a student's speech are provided in
sub-words, words, sentences, or paragraphs. Students
may receive an overall score, or a score in individual
parts of the speech.

[0076] Normalization issues regarding verification
of speech are managed through an interface. Given
speech of differing duration and complexity, the animated
cursor on the screen can be set by the system or
by the student. When the student reads along with the
animated cursor, the verification process can correlate
the text which is highlighted with the sound file to be
analyzed.

[0077] Certain recorded sounds can also be interjected
for emphasis of natural sound for known sub-words
or words of a given language. These words might
be taken from a previously recorded dictionary, application,
or other resource.

[0078] Baseline scores can then be recorded in a
table (such as shown in Figure 1 at 52). The table 52
may take a variety of forms and is used to determine an
appropriate level of a lesson or grading approach to be
selected for the student. With this table, the system can
automatically use the same text, content, etc. for students
of different abilities by modifying thresholds of
confidence measurement.
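
Reusing the same content for students of different abilities by changing only the confidence threshold might look like the sketch below. The level names follow the novice, intermediate, advanced levels mentioned earlier in the description, but the numeric thresholds and the function name are invented for illustration.

```python
from typing import Dict

# Hypothetical thresholds on a 0-to-1 confidence scale.
LEVEL_THRESHOLDS: Dict[str, float] = {
    "novice": 0.50,
    "intermediate": 0.70,
    "advanced": 0.85,
}


def meets_level(confidence: float, level: str) -> bool:
    """Judge one confidence score against the threshold for a level."""
    return confidence >= LEVEL_THRESHOLDS[level]


if __name__ == "__main__":
    # The same utterance score passes for a novice but not for an
    # advanced student.
    print(meets_level(0.60, "novice"), meets_level(0.60, "advanced"))
```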
[0079] The student can also use a graphical user
interface to establish or modify thresholds for the confidence
measurement, grade level, or other attributes. To
track his or her progress, the student registers an identification
number, baseline score, and subsequent lesson
scores to achieve customized lessons and to track
progress.

[0080] More specifically, Figure 9 illustrates a preferred
method for a student to so register. The method
900 is initiated by the entry of an identification number
by the student (step 902). A student grade level evaluation
process is then completed (step 904). A score is
recorded (step 906) and a subsequent lesson is
selected (step 908). The selected lesson is scored (step
910) and the student's record is updated (step 912).
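
A student record of the kind updated in method 900 could be modelled as below. The class and method names are hypothetical, and the progress measure (latest lesson score minus baseline) is just one plausible way to track progress, not something the specification prescribes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StudentRecord:
    student_id: str                                   # entered at step 902
    baseline_score: Optional[float] = None            # recorded at step 906
    lesson_scores: List[float] = field(default_factory=list)

    def record_baseline(self, score: float) -> None:
        self.baseline_score = score

    def record_lesson(self, score: float) -> None:
        """Append a lesson score (steps 910 and 912)."""
        self.lesson_scores.append(score)

    def progress(self) -> Optional[float]:
        """Latest lesson score minus baseline, or None if not yet known."""
        if self.baseline_score is None or not self.lesson_scores:
            return None
        return self.lesson_scores[-1] - self.baseline_score


if __name__ == "__main__":
    record = StudentRecord("S-001")      # made-up identification number
    record.record_baseline(60.0)
    record.record_lesson(75.0)
    print(record.progress())
```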
[0081] Referring now to Figure 10, the student
grade level evaluation process of step 904, for example,
is detailed as a method 1000. In this process, a first paragraph
is displayed (step 1002). The student reads the
first paragraph (step 1004). A confidence score is
measured by the system (step 1006). Grades are provided
for the total paragraph as well as sub-paragraph
elements (step 1008). The scores are compared to
other scores using a table lookup (step 1010) to determine
if a predetermined threshold is exceeded (step
1012). If the threshold is exceeded, then a second paragraph
is displayed for evaluation or, if the threshold is
not exceeded, the grade level is simply displayed (step
1014).

[0082] If the second paragraph is displayed, the student
reads the second paragraph (step 1016) and a
confidence level is measured (step 1018). A total paragraph
and sub-paragraph score is obtained (step 1020)
and again, a table lookup is used to determine the
grade or score (step 1022). The steps are repeated until
the score obtained from the table lookup does not
exceed the predetermined threshold (step 1024).
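
The loop of method 1000 can be sketched as follows, under the assumption that successive paragraphs are progressively harder and that the grade level corresponds to how many paragraphs were read above the threshold. The function name, the callable used to measure confidence, and the single fixed threshold are illustrative simplifications of the table lookup.

```python
from typing import Callable, Sequence


def evaluate_grade_level(paragraphs: Sequence[str],
                         measure_confidence: Callable[[str], float],
                         threshold: float) -> int:
    """Count how many paragraphs in a row scored above the threshold.

    Evaluation stops at the first paragraph whose confidence score does
    not exceed the threshold, mirroring the repeat-until condition.
    """
    passed = 0
    for paragraph in paragraphs:
        if measure_confidence(paragraph) <= threshold:
            break
        passed += 1
    return passed


if __name__ == "__main__":
    # Stand-in for the recognizer: made-up scores keyed by paragraph.
    fake_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.4}
    print(evaluate_grade_level(["p1", "p2", "p3"], fake_scores.get, 0.6))
```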
[0083] Referring now to Figure 11, a scoring example
1100 for a lesson selected and scored in steps 908
and 910 is illustrated. The exemplary lesson has a
score of ninety percent (90%) for an elementary level
student. First, the student is requested to recite a sentence
for which scores are given for each word of the
sentence as illustrated (step 1102). These scores for
each word identify words for which lessons should be
given to the student. In the example shown, the word
"fox" only received a 50 percent score so the student is
further tested on the word "fox" (step 1104). The student's
pronunciation of the letters of the word "fox" are
then given scores and, in the example, the "f" and "x"
sounds are determined to require further lessons (step
1106). Elementary vocabulary words with the "f" and "x"
sound are respectively selected for lessons (step 1108).
The same operation of steps 1104 to 1108 is repeated
for other words or sounds that were given low scores
(e.g. "jumps," "lazy," and "dog's") in the initial sentence
in step 1102 (step 1110). A variety of words are then
drilled in the lesson, including each identified sound
(step 1112). If necessary, recorded sounds from the dictionary
are played as model sounds for the student (step
1114). The lesson is then scored and a table is created
for lesson evaluations (step 1116).
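
The drill-down in the scoring example, from low-scoring words to low-scoring sounds within those words, might be sketched like this. All scores other than the 50 percent for "fox" are made up, as are the function names and the 70 percent cut-off.

```python
from typing import Dict, List


def below_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the items whose score falls below the threshold."""
    return [item for item, score in scores.items() if score < threshold]


def plan_lesson(word_scores: Dict[str, float],
                sound_scores: Dict[str, Dict[str, float]],
                threshold: float) -> Dict[str, List[str]]:
    """Map each weak word to the sounds within it that need drilling."""
    return {
        word: below_threshold(sound_scores.get(word, {}), threshold)
        for word in below_threshold(word_scores, threshold)
    }


if __name__ == "__main__":
    # Hypothetical per-word and per-sound scores (percent).
    word_scores = {"quick": 95.0, "fox": 50.0, "jumps": 60.0}
    sound_scores = {
        "fox": {"f": 40.0, "o": 90.0, "x": 45.0},
        "jumps": {"j": 55.0, "u": 90.0},
    }
    print(plan_lesson(word_scores, sound_scores, threshold=70.0))
```

The resulting mapping could then feed a vocabulary lookup that selects level-appropriate drill words for each identified sound.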
[0084] The system and method according to the
present application provide many advantageous features
and applications. Functions described above are
combined to create feature rich applications for learning.
The system includes scripted lesson combinations
to enable any student to use the system with ease. With
experience, the student or teacher can arrange for customized
combinations of functions to help a specific student
learning issue or learning style (and for creating
individualized plans for compliance with PL94-142). The
system will also recommend feature combinations
based upon scores of the student and available functions
associated with improving student skills. The system
includes files with lessons tailored to helping
students learn the basics of pronunciation for the target
language. In addition, tables of references for the language,
grammar, spelling, syntax, and pronunciation
rules are included with their associated help messages
for reinforcement learning. The flexibility of this system
makes it an ideal tool for the classroom and for the adult
student.

[0085] Directed learning experience - The basic
system feature is to demonstrate language by processing
text to speech and playing that speech to the student.
The student can become familiar with the sound of
the language to be learned. The student can listen to
the examples and learn about pronunciation.
[0086] Listen to any word - With Text-To-Speech
technology, the student can learn to imitate the language
sound even when a native speaker is not available.
Availability of recorded samples, lessons, etc. and
the availability of a dedicated native speaker are constrained
resources for students studying English, for
example, in many environments. With Text-To-Speech,
those constraints are eliminated. All materials on the
web, any text file, and any prepared lesson becomes a
multi-media language lesson. Any automatically generated
text file may be used to create up-to-the-minute
language lessons with this system. For example, by collecting
closed captioning text from any movie, television
or news program, text files may be created that can be
used by the functions of the system as content.
[0087] Listening comprehension - The basic system
feature of processing Text-To-Speech provides opportunities
for a student to practice listening comprehension
skills without requiring the production of special content,
and without requiring another person to be present. In
this case, the text may be hidden to improve the performance
of the feature.
[0088] Example and prompt - By combining Text-To-Speech
processing of the text with the Facial Animation,
an example is created for the student. Another
feature of the system adds a prompt for the student to
repeat the example. The student can use this feature of
the system to hear an example and then practice without
being recorded, graded or receiving feedback from
the system.

[0089] Example, prompt, record - The system can
combine three functions to provide a means for the student
to listen to the example, hear and/or see a prompt
of when to read and what to read, and to record his or
her own efforts to say the sub-word, word, or phrase.
[0090] Example, prompt, record, playback - The
system can combine functions to provide a means for
the student to listen to the example, hear and/or see a
prompt, record speech, and then play back the example
and the student speech for side by side comparison by
the student.

[0091] Self selected reinforcement - If the student
identifies a problem with a particular recorded sample
and determines that help is needed, the student can
access context specific help, which is described in the
workspace section. The student has
accessed a help system that can identify the rules of the
language associated with the word highlighted and can
present the "Try this..." series in a predetermined order
based upon known student errors in the general population
or in the group with which the student is identified.
The student can view and listen to some or all of the
reinforcement help messages.

[0092] Example, prompt, record, playback, compare,
display results - One of the comprehensive features
of this system includes the combination of the
teacher example with audio and visual, options of altering
the appearance of the teacher, options of altering the
sound characteristics of the teacher, seeing and/or
hearing a prompt, recording speech, allowing for playback
to hear performance, using Automated Speech
Recognition to process the student's spoken words,
obtaining a metric for performance, and then displaying that
performance within a flexible and adaptable frame of
reference.

[0093] Grammar skills - With the addition of a word
processing component, the language tutor can teach or
reinforce grammar skills. The student can listen to a text
passage and question, formulate an answer and speak
or type the answer into the system. The results of the
word processing program will generate examples of
errors in sentence syntax, etc. that will be used by the
system to recommend "Try this..." examples based
upon known rules of the language. Problem areas will
be highlighted as described above, and the student can
use the working window to practice skills. Lessons on
typical pronunciation issues for speakers of Asian languages
when learning English are included in the system.

[0094] Several functions in the system may be combined
to present lesson materials to the student. By
combining several functions, the teacher can control the
lesson plan, methods of teaching, and student feedback.
This provides significant flexibility in the system
and puts the user in control. Individualized Educational
Plans can be easily constructed for students, making
compliance with PL94-142 simple for the teacher. An
important feature of the system combines functions of
Text-To-Speech and Facial Animation as a visual aid in
pronunciation and typical facial, mouth, and tongue
movements associated with speech using real examples
from the lessons. This feature is valuable to students
studying a language other than their native
language and also to students working with a speech
therapist.

[0095] Special interest or subject content might be
desired in this circumstance. For example, an employee
of a company dealing with automobile parts or an
employee of a medical establishment might want to use
content from the company literature to practice listening.
Then he or she would be able to practice special
words, phrases, etc. that he or she might be likely to
hear in their environment, and therefore would be interested
in understanding.

[0096] The above description merely provides a
disclosure of particular embodiments of the invention
and is not intended for the purposes of limiting the same
thereto. As such, the invention is not limited to only the
above described embodiments. Rather, it is recognized
that one skilled in the art could conceive alternative
embodiments that fall within the scope of the invention.

Claims 

1. A system for interactive language instruction for a
user comprising:



on a comparison of the utterances to one of the
audible speech and the model.



2. The system as set forth in claim 1 further comprising
a third module synchronized to the first module,
the third module producing an animated image of a
human face and head pronouncing the audible
speech.

3. The system as set forth in claim 2 wherein the animated
image of the human face and head portrays
a transparent face and head.

4. The system as set forth in claim 2 wherein the first
and third modules further include controls to control
one of the volume, speed, and vocal characteristics
of the video image and the audible speech.

5. The system as set forth in claim 1 wherein the
model is one of a predictive model, a phoneme
model, a diphone model, and a dynamically generated
model.

6. The system as set forth in claim 1 wherein the first
module includes files storing model pronunciations
for words comprising the input text.

7. The system as set forth in claim 1 further comprising
lesson files wherein the input text is based on
data stored in the lesson files.

8. The system as set forth in claim 1 wherein the input
text is based on data received from a source outside
of the system.

9. The system as set forth in claim 1 wherein the system
further includes dictionary files.



a first module configured to convert input text to
audible speech in a selected language, the
audible speech being patterned after a model;

a user interface configured to receive utterances
spoken by a user in response to a
prompt to replicate the audible speech; and

a second module configured to recognize the
utterances and provide feedback to the user as
to a precision at which the user replicates the


10. The system as set forth in claim 1 wherein the system
further comprises a record and playback module.

11. The system as set forth in claim 1 wherein the system
further includes a table storing mapping information
between word subgroups and vocabulary
words.

12. The system as set forth in claim 1 wherein the system
further includes a table storing mapping information
between words and vocabulary words.

13. The system as set forth in claim 1 wherein the system
further includes a table storing mapping information
between words and examples of parts of
speech.

14. The system as set forth in claim 1 wherein the sys-

15. The system as set forth in claim 1 wherein the system
includes specific pronunciation files.

16. A system comprising:

a first module configured to convert input text to
audible speech in a selected language, the
audible speech indicative of a model;

a second module synchronized to the first module,
the second module producing an animated
image of a human face and head pronouncing
the audible speech;

a user interface positioned to receive utterances
spoken by a user in response to a
prompt to replicate the audible speech; and,

a third module configured to recognize the
utterances and provide feedback to the user as
to a precision at which the user replicates the
speech in the selected language based on a
comparison of the utterances to one of the
audible speech and the model.

17. A method for voice interactive language instruction
comprising:

converting input text data to audible speech
data;

generating audible speech comprising phonemes
based on the audible speech data;

outputting the audible speech through an audio
output device;

generating an animated image of a face and
head pronouncing the audible speech;

synchronizing the audible speech and the
video image;

prompting the user to replicate the audible
speech;

recognizing utterances generated by the user
in response to the prompting;

comparing the audible speech to the utterances;
and,

providing feedback to the user based on the
comparison.

18. The method as set forth in claim 17 further comprising
receiving the input text from one of a network, a
stored lesson file, a scanner, and the Internet.



19. The method as set forth in claim 17 wherein the
feedback comprises providing a playback of
selected portions of the audible speech and utterances.

FIG. 2